Ensemble methods for offline handwritten text line recognition
Abstract
This thesis investigates ensemble methods for offline recognition of English handwritten text lines. Multiple recognisers are automatically generated from a single base recognition system. Combining the output of these multiple recognisers provides the final ensemble result. The underlying recognisers are based on hidden Markov models. One model is built for each character. Based on the lexicon, word models are derived by concatenating character models. A statistical language model is used to build text line models by preferring more likely word sequences over unlikely word sequences. A postprocessing step calculates confidence values for each recognised word. Ensembles of recognisers are generated based on variation of the training data, the features, and the system architecture. Because the output of a handwritten text line recogniser is a sequence of words, most existing combination methods cannot be applied directly. The combination has to be performed in two steps. First, the word sequences are synchronised by a string alignment procedure. Second, a decision strategy derives the combination result for each segment of the alignment. For this purpose, confidence-based voting, a statistical decision method, and a decision method that includes language model information are used. The experimental evaluation on a large set of images of handwritten text lines indicates that the proposed ensemble methods can significantly increase the performance of an offline handwritten text line recognition system.
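The two-step combination described in the abstract (string alignment followed by a per-segment decision) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a simple incremental edit-distance alignment and a sum-of-confidences vote, and the names `align`, `vote`, `combine`, and the `null_conf` parameter (the weight given to "no word here") are hypothetical choices made for illustration only.

```python
from collections import defaultdict

# A hypothesis is a list of (word, confidence) pairs from one recogniser.
# A column holds one entry per recogniser; None marks a gap (no word).

def align(columns, hyp, n_prev):
    """Align `hyp` to the existing columns by minimum edit distance."""
    m, n = len(columns), len(hyp)
    col_words = [{e[0] for e in col if e is not None} for col in columns]
    # cost[i][j]: cheapest alignment of the first i columns with the first j words
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i
    for j in range(1, n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hyp[j - 1][0] in col_words[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Trace back from the end to rebuild the synchronised columns.
    out, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if hyp[j - 1][0] in col_words[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                out.append(columns[i - 1] + [hyp[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            out.append(columns[i - 1] + [None])  # new hypothesis has a gap
            i -= 1
        else:
            out.append([None] * n_prev + [hyp[j - 1]])  # earlier ones have a gap
            j -= 1
    out.reverse()
    return out

def vote(columns, null_conf):
    """Pick, per column, the candidate with the largest summed confidence."""
    result = []
    for col in columns:
        score = defaultdict(float)
        for entry in col:
            if entry is None:
                score[None] += null_conf  # assumed weight for "no word"
            else:
                score[entry[0]] += entry[1]
        best = max(score, key=score.get)
        if best is not None:
            result.append(best)
    return result

def combine(hypotheses, null_conf=0.5):
    """Step 1: synchronise the word sequences. Step 2: decide per segment."""
    columns = [[entry] for entry in hypotheses[0]]
    for k, hyp in enumerate(hypotheses[1:], start=1):
        columns = align(columns, hyp, k)
    return vote(columns, null_conf)

if __name__ == "__main__":
    h1 = [("the", 0.9), ("cat", 0.4), ("sat", 0.8)]
    h2 = [("the", 0.8), ("cap", 0.7), ("sat", 0.9)]
    h3 = [("the", 0.7), ("cat", 0.6), ("sat", 0.5), ("down", 0.2)]
    print(combine([h1, h2, h3]))  # ['the', 'cat', 'sat']
```

In this toy example, "cat" wins its segment because its summed confidence (1.0) exceeds that of "cap" (0.7), and the low-confidence trailing word "down" is dropped because the two gaps outweigh it. The statistical and language-model-based decision strategies mentioned in the abstract would replace the `vote` step and are not sketched here.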
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitisation, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Use of the Shearlet Transform and Transfer Learning in Offline Handwritten Signature Verification and Recognition
Despite the rapid growth of technology, the handwritten signature remains users' first choice among biometrics. In this paper, a new methodology for offline handwritten signature verification and recognition based on the Shearlet transform and transfer learning is proposed. Since a large percentage of handwritten signatures are composed of curves and the performance of a sig...
Experiments in Unconstrained Offline Handwritten Text Recognition
A system for off-line handwritten text recognition is presented. It is characterized by a segmentation-free approach, i.e. whole lines of text are processed by the recognition module. The methods used for pre-processing, feature extraction, and statistical modelling are described, and several experiments on writer-independent, multiple writer, and single writer handwriting recognition tasks are...
Off-line Handwritten Arabic Character Recognition: A Survey
The automatic recognition of text in scanned images has several applications, such as automatic postal mail sorting and searching in large volumes of documents. Although Arabic handwritten text recognition has been addressed by many researchers, it remains a challenging task due to several factors. This paper presents an overview of off-line handwritten Arabic character recognition and summarizes...
KHATT: An open Arabic offline handwritten text database
A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research. This is especially true due to the lack of such a database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline Handwritten Text database (KHATT) consisting of 1000 handwritten forms written by 1000 distinct writers from different countries....
Publication date: 2008